EusBila, a search service designed for the agglutinative nature of Basque

Authors

  • Igor Leturia
  • Antton Gurrutxaga
  • Nerea Areta
  • Iñaki Alegria
  • Aitzol Ezeiza
Abstract

The performance of major search engines for Basque is far from satisfactory, partly because of the agglutinative nature of the language (search engines are commonly known to perform poorly with such languages) and partly because Basque is not among the languages to which search engines can restrict their results. In this paper we present EusBila, a search service for Basque that relies on the APIs of search engines yet obtains lemma-based, language-filtered search by means of morphological query expansion and language-filtering words. It is a cost-effective approach that we think can be used for other agglutinative or minority languages. We also evaluate how well EusBila performs on Basque queries and compare its performance with that of a major search engine in terms of precision and recall, showing that EusBila is a valid solution.
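The two techniques named in the abstract, morphological query expansion and language-filtering words, can be illustrated with a minimal sketch. This is not EusBila's implementation: the suffix list, the filter words, the function names, and the `OR`/parenthesis query syntax are all assumptions chosen for illustration, and a real system would use a proper morphological generator instead of naive suffix concatenation.

```python
# Illustrative sketch only; not the method described in the paper.
# ASSUMPTION: a tiny hand-picked list of Basque case/number suffixes.
ILLUSTRATIVE_SUFFIXES = ["", "a", "ak", "an", "ko", "ra", "tik"]

# ASSUMPTION: two very frequent Basque words ("eta" = and, "da" = is)
# used to bias results towards pages written in Basque.
ILLUSTRATIVE_FILTER_WORDS = ["eta", "da"]


def expand_lemma(lemma, suffixes=ILLUSTRATIVE_SUFFIXES):
    """Return candidate surface forms by appending each suffix to the lemma.

    Naive concatenation ignores real morphophonology (for example,
    lemmas ending in -a), so it is only a stand-in for a generator.
    """
    return [lemma + suffix for suffix in suffixes]


def build_query(lemmas, filter_words=ILLUSTRATIVE_FILTER_WORDS):
    """Build one query string: an OR-group per lemma, then the filter words.

    ASSUMPTION: the target search API accepts `OR` and parentheses and
    treats space-separated terms as a conjunction.
    """
    groups = ["(" + " OR ".join(expand_lemma(lemma)) + ")" for lemma in lemmas]
    return " ".join(groups + list(filter_words))


if __name__ == "__main__":
    print(build_query(["etxe"]))
```

Calling `build_query(["etxe"])` yields a single query in which any of the generated forms of "etxe" (house) may match, while the appended filter words make non-Basque pages less likely to be returned. How many forms and filter words a given search API tolerates per query is something this sketch does not address.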

Similar resources

LSA learner sentence comprehension in agglutinative and non-agglutinative languages

This work has been carried out in the context of automatic evaluation of learner summaries, where text comprehension is gained using Latent Semantic Analysis (LSA) and Natural Language Processing (NLP) techniques. We had intuitively observed that lemmatized versions of LSA matrices resembled human similarity judgements for Basque better than the non-lemmatized ones. This research was conducted to tes...

Building the Gold Standard for the Surface Syntax of Basque

In this paper, we present the process of constructing SF-EPEC, a syntactically annotated 300,000-word corpus that aims to be a Gold Standard for the surface syntactic processing of Basque. First, the tagset designed for this purpose is described; because Basque is an agglutinative language, complex syntactic tags were sometimes needed. We also account for the different phases in the construct...

Using Finite State Technology in Natural Language Processing of Basque

This paper describes the components used in the design and implementation of NLP tools for Basque. These components are based on finite state technology and are devoted to the morphological analysis of Basque, an agglutinative pre-Indo-European language. We think that our design can be interesting for the treatment of other languages. The main components developed are a general and robust morph...

Coreference Resolution for Morphologically Rich Languages. Adaptation of the Stanford System to Basque

This paper presents the adaptation of the Stanford coreference resolution system to Basque, an agglutinative head-final pro-drop language. The adapted system has been integrated into a global linguistic analysis pipeline so that the input to the system is raw Basque text that has been linguistically processed and annotated. We demonstrate that language-specific characteristics have a noteworthy e...

Combining Stochastic and Rule-Based Methods for Disambiguation in Agglutinative Languages

In this paper we present the results of combining stochastic and rule-based disambiguation methods applied to the Basque language. The methods we have used in disambiguation are the Constraint Grammar formalism and an HMM-based tagger developed within the MULTEXT project. As Basque is an agglutinative language, a morphological analyser is needed to attach all possible readings to each word. T...

Publication date: 2007